Abstract: We present three Virtual Observatory tools developed at the ATNF for the storage,
processing and visualisation of ATCA data. These are the Australia Telescope Online Archive, a prototype
data reduction pipeline, and the Remote Visualisation System. These tools were developed in the
context of the Virtual Observatory and were intended to be both useful for astronomers and technology
demonstrators. We discuss the design and implementation of these tools, as well as issues that should
be considered when developing similar systems for future telescopes.
1 Introduction

The so-called data explosion in astronomy promises ex- 
citing new scientific developments, but brings with it 
many technical challenges, in collecting, storing, trans- 
porting, processing and visualising data. Virtual Ob- 
servatories (VO) have developed to meet some of these 
technical challenges. Falling under the broad area of 
e-science (which incorporates other scientific domains 
facing similar challenges, such as genetics and particle 
physics) the aim of Virtual Observatory research is to 
provide the tools necessary for dealing with this data. 

The Australian Virtual Observatory (Aus-VO)^ was 
started in 2003 with the aim of both contributing to 
the international VO effort, and developing tools of 
use to Australian astronomers. Australia has many 
areas of expertise (for example radio astronomy) and 
it makes sense to focus our efforts on providing mod- 
ern tools for working in these areas. In this context 
the ATNF decided to develop a range of tools for stor- 
ing, processing and visualising data from the Australia 
Telescope Compact Array. The aim was that these 
tools would be useful to astronomers now, and at the 
same time let us explore the technology that would be 
necessary for developing software for future telescopes 
such as the Square Kilometre Array (SKA). 

This paper is based on a talk given at the ASA 
Annual Meeting in Sydney in July 2005. After giv- 
ing some background about the ATCA and the Vir- 
tual Observatory, we discuss three tools developed at 
the ATNF over the last few years: firstly, the Australia
Telescope Online Archive, which contains all of the data
collected so far by the ATCA; secondly, a prototype data
reduction pipeline for ATCA data; and finally, the Remote
Visualisation System for viewing large datasets.



^ http://www.aus-vo.org



2 The ATCA 

The Australia Telescope Compact Array (ATCA) is an 
east-west earth-rotation synthesis interferometer, with 
six 22 m antennas on a 6 km baseline. It has been in 
operation at Narrabri since 1990. The telescope can 
observe at 6 bands with wavelengths 20 cm, 12 cm, 6 
cm, 3 cm, 1 cm and 3 mm. Each antenna observes two 
frequencies simultaneously. There are six bandwidths 
available on Frequency 1 (128, 64, 32, 16, 8 and 4 
MHz) and two bandwidths on Frequency 2 (128 and 
64 MHz). The telescope produces ~ 0.5 GB of raw 
data per day, and this is likely to increase significantly 
with future telescope upgrades. 

In the rest of this section we outline the exist- 
ing systems for archiving, processing and visualising 
ATCA data. 

2.1 Data archiving 

Since the commencement of operation of the ATCA in 
June 1990, a complete record of all data observed from 
the telescope has been maintained offline at the tele- 
scope site, mostly recorded on CD. In conjunction with 
this, the ATNF maintained a record of the project proposals
for observations on the telescope (the Projects database)
and a short form of the observation parameters for each
day's observing (the Positions database).
After a proprietary period of 18 months, in which the 
observing team has sole access to the data obtained 
in an observation, the data is made publicly available. 
Astronomers can search for observations on the ATNF 
webpage, submit the details of the observation data 
required via e-mail and have a CD containing the data 
prepared for them at nominal cost. 

2.2 Data processing 

ATCA data processing (reduction) is generally performed
with one of the standard radio data reduction packages,
most commonly Miriad (Sault et al. 1995), but also AIPS^
and AIPS++^. After loading, editing and calibrating the
data, the resulting product is an intensity map referred
to as the dirty image. At this stage a deconvolution
algorithm, usually a variant of CLEAN (Högbom 1974), is
required to produce the final image.

At each stage in the process there are a range of 
parameters that can be set to control the type of pro- 
cessing performed. Both general parameters (such as 
calibration strategy, CLEAN method and type of data 
editing) as well as fine-grained parameters (such as cal- 
ibration solution interval, number of CLEAN iterations 
and median filter size) need to be modified to obtain 
the best results. Hence data processing is typically a 
highly interactive process. 

There is an existing system operating at the telescope,
CAONIS, designed for on-the-fly imaging of ATCA data.
However, the design of this system makes it difficult to
port to current Linux systems.

2.3 Data visualisation 

The final images from the ATCA are usually visualised
using tools such as Miriad or kvis (Gooch 1996).
These are well established tools that cover many of 
the visualisation requirements of ATCA observers. As 
mentioned in the previous section, the CAONIS system 
which runs at Narrabri also allows basic visualisation 
of images. 

3 The Virtual Observatory 

The umbrella organisation for Virtual Observatory work 
is the International Virtual Observatory Alliance (IVOA).
The IVOA was formed in June 2002 with a mission to 

facilitate the international coordination and 
collaboration necessary for the development 
and deployment of the tools, systems and 
organizational structures necessary to en- 
able the international utilization of astro- 
nomical archives as an integrated and in- 
teroperating virtual observatory. 

The IVOA is a collaboration between over 15 mem- 
ber countries including Australia. The focus so far has 
been developing the standards required for interoper- 
ability between software developed and data produced 
in all areas of astronomy. Another significant aim is to 
develop the infrastructure required (networks and or- 
ganisations) for the large scale storage and distribution 
of astronomical data. 

The IVOA working groups address a range of is- 
sues such as grid and web services, data modelling and 
standards for data access. There are also four
interest groups

• Applications IG 

• Astronomy Grid IG 

^ Astronomical Image Processing System (AIPS),
http://www.cv.nrao.edu/aips

^ Astronomical Image Processing System (AIPS++),
http://aips2.nrao.edu



• Data Curation IG 

• Theory IG 

which focus on the requirements of particular applica- 
tion domains. 

The aim of the Australian Virtual Observatory (Aus- 
VO) is to provide distributed, uniform interfaces to the 
data archives of Australia's major observatories and 
the archives of simulation data. Aus-VO is a collab- 
oration between many Australian institutions, includ- 
ing the Universities of Melbourne, Sydney, New South 
Wales and Queensland, Monash University, Swinburne 
University of Technology, the Australian National Uni- 
versity and Mount Stromlo Observatory, the Victorian 
Partnership for Advanced Computing, the ATNF and 
the AAO.

There are a range of VO projects underway in Australia,
including the development of data archives and software
for HIPASS (Meyer et al. 2004), RAVE (Siebert 2004), 2QZ
(Croom et al. 2004) and SUMSS (Bock et al. 1999). The
initial focus of most of these projects has been to make
data from Australian projects widely available within the
international community, in a VO compliant format. In
addition there are several projects investigating novel
methods for astronomical data mining and data analysis;
for example, Rohde et al. (2005) apply machine learning
techniques to catalogue crossmatching. The Melbourne
University group has also been setting up infrastructure
such as a registry for Australian web services and data
archives.

4 The Australia Telescope
Online Archive

In June 2003, a joint project between the ATNF and 
the CSIRO ICT Centre was commenced to make the 
ATNF archive data available online as the Australia 
Telescope Online Archive (ATOA). This was planned 
as a new data resource for astronomers, as well as the 
foundation for the development of online data processing
systems to make the raw data more accessible to
non-expert users (see Section 5). The construction of
the ATOA required the copying of the offline archive 
(at the time, ~ 2700 CDs, totalling ~ 1.7 TB) from the 
telescope site to Canberra where the online archive was 
to be developed, creating a meta-database describing 
the data, and making a web front-end to search and 
download the data. 

The database consists of two parts. The first is the raw
data from the telescope (RPFITS files), which is stored as
normal files on the host system. The second is a relational
database which stores all the metadata (discussed in
Section 4.1). The 'vital statistics' of the ATOA are shown
in Table 1. The current rate of growth of the archive is
~ 0.5 GB/day. However, this is likely to increase
significantly in the future as new instruments come online.
To maintain a growing archive (rather than a static one) it
is necessary to ensure that the RPFITS files are stored in
a readily accessible way (currently on a RAID system) that
is easily distributed over a number of drives, and that the
database itself is easy to update in a robust manner.
The ATOA was made publicly available in December 2004 and
can be accessed from http://atoa.atnf.csiro.au

Table 1: ATOA Statistics

Projects            2261
Files               57147
Sources             128111
Metadata size       ~ 4 GB
RPFITS data size    ~ 2 TB
Growth rate         0.5 GB/day

4.1 Metadata

Metadata is simply data which describes other data, for
example the project code or the name of the primary
calibrator. The meta-database for the ATOA consists of
three main parts: the contents of the original ATNF online
Projects database, metadata describing the observation that
is extracted directly from the raw data files produced by
the telescope's software, and metadata inferred from all of
the available data sources to assist in the automation of
reducing the telescope's raw data to images. The types of
metadata used in the ATOA are summarised in Table 2.

Table 2: Metadata in the ATOA

Metadata source        Examples
Projects database      proposal; observer name; country; institution
Positions database*    source names & positions
RPFITS files           observing band; receivers; scans; polarisations;
                       array configuration
Inferred               calibrator names & roles; calibrator-target matches

* The positions database is included in our data model, and
some of the metadata is used to reconstruct the observation
metadata. However, it is not actually loaded into the ATOA.

Most of the metadata available in the ATNF Positions
database is also available from the metadata in the raw
data files, and is finer-grained, since the Positions data
is a daily summary, while the file metadata is available
for each telescope pointing. The inferred metadata in the
ATOA is 'value added' information that is automatically
determined from the existing metadata, for example the
calibration role of each source (primary calibrator,
secondary calibrator, target, etc). This is discussed
further in Section 5.2.

4.2 A data model for the ATCA

A data model is a comprehensive scheme describing how data
is to be represented, for manipulation by humans or
computer programs. Data models are critical for planning
how data will be organised within a database as they
describe all the relationships between the different
entities.

A section of our data model for the ATCA is shown in
Figure 1. We now briefly explain the UML (Unified Modelling
Language) notation used in the data model. Each box
contains an entity (e.g. Scan) that has been identified in
the metadata. Each entity has attributes (e.g. restFreq),
each of which is of a specified data type (e.g. float).
Associated entities are connected to each other with lines,
which also specify the cardinality of the relationship. For
example

Scan  1..* -------- 0..*  SPW

should be read as "A scan has 0 or more spectral windows.
A spectral window has 1 or more scans."

The development of a data model that covers the whole of
astronomy is an ongoing project within the international VO
community. We have contributed this data model to the IVOA
Data Model WG as an example of a data model for radio
astronomy. For more information on this topic, see the IVOA
Data Modelling website^.

The ATOA archive database structure is created directly
from the definitions in the ATOA data model. Parts of the
data model contain information for specific database
implementations so that all of the implementation-specific
parts of the database creation are handled in this process.
The data model in Figure 1 corresponds to the part of the
data model that describes the metadata contained directly
in the archive RPFITS files. The data model for the
inferred data is available from the ATOA web pages^.
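
As an illustration of the cardinality notation explained in this section, the Scan and SPW relationship could be expressed in code roughly as follows. This is a minimal sketch, not the ATOA schema: the class names, the `scan_id` field and the placement of `rest_freq` on the spectral window are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Scan:
    """A scan may be linked to zero or more spectral windows (0..*)."""
    scan_id: int  # hypothetical identifier, not taken from the ATOA model
    spectral_windows: List["SpectralWindow"] = field(default_factory=list)


@dataclass(eq=False)
class SpectralWindow:
    """A spectral window must belong to at least one scan (1..*)."""
    rest_freq: float  # attribute of type float, as in the restFreq example
    scans: List[Scan]

    def __post_init__(self) -> None:
        # Enforce the 1..* end of the association.
        if not self.scans:
            raise ValueError("a spectral window needs at least one scan")
        # Keep both ends of the association consistent.
        for scan in self.scans:
            scan.spectral_windows.append(self)
```

A scan can therefore exist with an empty list of spectral windows, while constructing a spectral window without any scan is rejected.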



4.3 Implementation 

The ATOA web interface was implemented as a Java
(ver. 1.4.2)^ application and is hosted using the Apache
Tomcat (ver. 5.0.28)^ web container. Relational database
services are provided by an Oracle 9i instance running 
on the same machine. A web based interface was cho- 
sen so as to maximise interoperability and provide easy 
access to users. For example, RPFITS files may sim- 
ply be downloaded by constructing a suitable URL for 
the ATOA file server. This allows files on the server 
to be downloaded by a Web browser, by command-line 
programs that allow users to fetch the data referred to 
by a URL, or by application programs using libraries 
that allow a URL to be opened in a similar way to a 
file on a local file system. 
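
The URL-based access described above can be sketched with Python's standard library. The base URL and file name used with it are placeholders: the actual URL layout of the ATOA file server is not given here, so `build_file_url` only shows the general idea of constructing a URL and opening it like a local file.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen


def build_file_url(base_url: str, file_name: str) -> str:
    """Join a server base URL and a file name into a download URL.

    The layout (base URL followed by the file name) is an assumption
    made for this sketch, not the documented ATOA scheme.
    """
    return base_url.rstrip("/") + "/" + quote(file_name)


def download(url: str, dest_path: str) -> None:
    """Open the URL much like a local file and copy it to disk."""
    with urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, `build_file_url("http://example.org/files/", "my file.rpf")` gives `http://example.org/files/my%20file.rpf`, which could then be passed to `download`.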



^ http://www.ivoa.net/twiki/bin/view/IVOA/IvoaDataModel
^ http://www.atnf.csiro.au/computing/web/atoa/implementation.html
^ http://java.sun.com
^ http://tomcat.apache.org
The user interface centres around two main web
pages: the query page which allows users to specify 
criteria for selecting RPFITS files from the archive, 
and a results page which provides the means for users 
to inspect the metadata of matching files and download 
particular files if desired. The results page initially 
presents the user with a broad, global view of the query 
results in tabular form listing details such as file name, 
file size, principal investigator and array configuration. 
The user may also interactively 'drill-down' for a more 
detailed view of any file in the list. RPFITS files can 
be downloaded individually or in batches. 

As mentioned in Section 2.1, ATCA data has a proprietary
period of 18 months, in which it is only ac-
cessible to members of the project team. Authorised 
access is supported for data within the proprietary ac- 
cess period. If a user wishes to access proprietary data 
they must first go through a manually verified authen- 
tication process after which a password is issued to the 
Principal Investigator for that project. In the future 
we plan to replace this authentication and authorisa- 
tion method with a streamlined system that links the 
new ATNF proposal system, OPAL*, and its authenti- 
cation database to the ATOA. Users will then be given 
access to proprietary data by using their OPAL creden- 
tials based on the projects they are associated with in 
the OPAL system.

The ATOA web server and database are hosted on 
a Dell PowerEdge 750 running Debian Linux 3.0. The 
host has a 2.8GHz Pentium 4 processor, 2GB of RAM 
and is attached to a 3 terabyte Apple Xserve RAID for 
archive storage. 

5 A data processing pipeline 
framework 

The data products in radio astronomy are often less 
accessible to the non-expert than those in other do- 
mains such as optical astronomy. It requires a reason- 
ably high level of domain expertise to process the raw 
data and produce an image. Obviously for carrying 
out detailed scientific analysis it would be necessary 
to develop this expertise, or collaborate with a radio 
astronomer. However in an era of multiwavelength as- 
tronomy, astronomers expect to download and com- 
pare data from a variety of telescopes, at a variety of 
wavelengths. 

With this in mind we have developed an automatic pipeline
for people who want to quickly inspect the data in the
ATOA, to see if it is suitable for further processing. One
of the aims of this project was to test the viability of
'driving' the pipeline using the metadata discussed in
Section 4.1. In other words, the pipeline should make
decisions about what kind of processing to do, both on a
general level (e.g. continuum or spectral line) and on a
specific level (e.g. number of CLEAN iterations).

In this section we discuss the development of the extra
metadata required to drive the pipeline, in particular the
calibration process. We then outline our prototype
pipeline, which can process single-pointing continuum data
from the ATOA and is available for testing at
http://atoa.atnf.csiro.au/test

* http://opal.atnf.csiro.au/

5.1 Metadata for automatic 
processing 

The metadata in the Project and Position databases, 
while providing information about which astronomical 
sources have been recorded in an observing session, 
does not (in general) provide any information about 
the role that the observer intended the source to play in 
the observation (e.g. primary calibrator, target source).
This would be relatively easy to record in a new sys- 
tem, but as we are dealing with existing data we had 
to infer the roles of sources. 

Another problem for automatic processing is the 
grouping of data into valid 'observations'. An expert 
would typically choose an appropriate subset of files 
from the archive to image. However, a non-radio
astronomer may choose a subset that contains files that
should be imaged separately, or files that contain data 
that should be ignored entirely. Although it is impos- 
sible to deal with all cases, our aim was to have the 
pipeline group together the selected data in such a way 
that an image could be made in at least 80% of cases. 
A wide range of observation types can be recognised 
and characterised using the meta-database but are not 
yet processed by the prototype pipeline (e.g. millime- 
tre and spectral line observations). 

In the following section we discuss how we assign 
the source roles within an observation, and the algo- 
rithm we used to match target sources with the appro- 
priate calibrators. 

5.2 Determining source roles 

While matching target sources with their calibrators 
would be straightforward for an astronomer it is a chal- 
lenge for an automatic system. In a typical (simple) 
observing session the primary calibrator is recorded 
for a short period at the start or end of the observing 
session; and alternating pointings are made to the sec- 
ondary, and the source of interest, or target. However 
there is a great variety of different ways that the ob- 
server can choose to structure their observations. If an 
observation contains more than one target, the targets 
may share, or have distinct, secondary calibrators, de- 
pending on their separation in the sky. There may be 
several secondary calibrators for each target, and the 
same source may be used for primary and secondary 
calibration. In addition, some observers use secondary 
calibrators that are not in the list of recommended cal- 
ibrators, and that list has itself changed over time. 

In order to classify the sources in an observing
session the following metadata is used:

• The locations and names of sources extracted 
from the raw telescope data 

• The times and durations of the source pointings 

• The names and locations of the four primary 
calibrators commonly used at the ATCA 
• A recent ATCA catalogue of recommended secondary
calibrators

• Names of sources extracted from project titles 

• A pre-assembled list of possible calibrator sources 

Once the source roles have been determined, the proximity
in the sky and the proximity in observation time of the
targets and their secondary calibrators are used to match
targets with their respective calibrator(s). For each
target pointing, a weight is calculated for each secondary
calibration pointing made within two hours of observation
of the target pointing:

$$ w_{t,s} = \sum_{p} \sum_{\substack{s \in S \\ \Delta t_{t,s} < \Delta t_{\max}}} \ldots \; e^{-(3\Delta t/\Delta t_{\max})} $$

where $S$ is the set of candidate secondary calibrators,
$\alpha$ is the angular separation between the target and
the secondary calibrator, $\alpha_{\max}$ is the maximum
desirable separation between the target and the secondary
calibrator (and is a function of the observing frequency
band), $\Delta t$ is the separation of the time midpoints
of the target and calibrator pointings and
$\Delta t_{\max}$ is the maximum desirable time separation
(two hours). The summation is over all pointings at a
target ($\sum_p$) and all secondary calibrators within two
hours of a target pointing ($\sum_s$). The $w_{t,s}$ are
used to select suitable secondary calibrators for the
respective targets from the calibrators whose $w_{t,s}$
weights dominate for a particular target.

This procedure constructs the metadata required 
for continuum imaging at centimetre wavelengths. The 
algorithm works well in general, but there are some 
problematic cases, for example where the target is a 
source from the secondary calibrator catalogue. 
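
The calibrator matching described in this section could be sketched as follows. The functional form of the weight is an assumption made for this example: a product of an exponential fall-off in angular separation and an exponential fall-off in time separation. The function name, the tuple layout and the units (degrees and hours) are likewise hypothetical and are not taken from the pipeline.

```python
import math
from collections import defaultdict


def calibrator_weights(target_midpoints, calibrator_pointings,
                       alpha_max, dt_max=2.0):
    """Accumulate a weight per candidate secondary calibrator.

    target_midpoints: time midpoints (hours) of the pointings at one target.
    calibrator_pointings: tuples (name, separation_deg, midpoint_hours).
    alpha_max: maximum desirable angular separation (degrees).
    dt_max: maximum desirable time separation (hours).
    """
    weights = defaultdict(float)
    for t_mid in target_midpoints:
        for name, separation, c_mid in calibrator_pointings:
            dt = abs(t_mid - c_mid)
            if dt >= dt_max:
                continue  # only pointings within dt_max contribute
            # ASSUMED form: exponential in both angle and time.
            weights[name] += (math.exp(-separation / alpha_max)
                              * math.exp(-3.0 * dt / dt_max))
    return dict(weights)
```

The calibrator with the dominant weight for a target would then be chosen, for example with `max(weights, key=weights.get)`.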

5.3 Implementation 

The underlying processing of the ATCA data is car- 
ried out using the Glish scripting language in AIPS++. 
The ATOA imaging Web Services interface was con- 
structed using the Apache Axis tools (ver. 1.1)^, and 
interfaces to the processing scripts through a Perl (ver. 
5.4.8) script that deals with the control of the exe- 
cution of the Glish scripts. 
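
The role of such a control layer between a web service and the processing scripts can be illustrated with a short sketch. This is not the Perl script used in the pipeline: the function name and the returned dictionary are assumptions, and the command to run is supplied by the caller rather than being a real Glish invocation.

```python
import subprocess


def run_processing_script(command, timeout_s=600):
    """Run an external processing command and report how it went.

    command: list of program and arguments, supplied by the caller.
    Returns a small dictionary that a web service layer could pass on.
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr for the log
        text=True,
        timeout=timeout_s,    # raises TimeoutExpired if exceeded
    )
    return {
        "ok": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
```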

The pipeline client is written using Python (ver.
2.3)^ and the SOAPpy web services tools (ver. 0.11.3)^.
There were some minor, but difficult to find, prob- 
lems in interoperation between the SOAPpy tools and 
Apache Axis; the data structures used in the web ser- 
vices calls are possibly more complicated than had 
been previously used between the two web services 
implementations. Documentation in both was not as 
informative or complete as it might have been. 

The pipeline web services can be configured to run 
directly on the server host, or be directed to run on 
other machines through a batch queuing system, since 
some stages in the pipeline can run for several CPU 



^ http://ws.apache.org/axis
^ http://www.perl.com
^ http://python.org
^ http://pywebsvcs.sourceforge.net



minutes. We used the OpenPBS Batch Queuing System
(ver. 2.3)^ for queue management, but unfortunately
it has no mechanism for reporting job completion
to another program. After processing for a web
service completes, the batch job doing the processing 
sends a completion message to the program invoked 
by the web service that controls the execution of the 
processing for the service. However, at this point, the 
batch processing system has not yet transferred the 
job's output data back to the pipeline server. The 
control program then polls the PBS batch queue at 
five second intervals to ensure that the batch job has 
completed. 
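
The polling step described above can be sketched as a simple loop. The `job_finished` callable stands in for whatever query the control program makes against the batch queue, and the timeout is an addition for the example; both are assumptions rather than details of the pipeline.

```python
import time


def wait_for_job(job_id, job_finished, interval=5.0, timeout=600.0,
                 sleep=time.sleep):
    """Poll until the batch job has finished, then return the time waited.

    job_finished: callable taking job_id and returning True once the
        queue no longer lists the job (hypothetical helper).
    interval: seconds between polls (five seconds in the text).
    timeout: give up after this many seconds (an assumption).
    """
    waited = 0.0
    while not job_finished(job_id):
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} did not finish in time")
        sleep(interval)
        waited += interval
    return waited
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.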

The raw data from the ATOA, all the intermediate 
files from the data processing, the log files, and the re- 
sulting images are stored temporarily on the pipeline 
server. The first web service call made by a pipeline 
client reserves a private location for storage, and re- 
quests a lifetime for the storage. The pipeline server 
has a configurable maximum lifetime, and the stored 
data will be deleted after this time expires. Only 
clients who have the name of the storage area (a ran- 
domly generated string) can access it. There is no 
quota on the storage use of any individual temporary 
storage area. However, a quota may be imposed on the 
total amount of storage available to all active storage 



Figure 2: System architecture. This schematic shows the
relationship between the three tools we have developed.
(Components shown: ATOA metadata, ATOA query server, ATOA
raw data, ATOA file server, Pipeline server, Pipeline
client, Session state, Remote Visualisation System.)



The ATOA and pipeline web services return a URL for the
generated images to the end-user's system. This allows the
URL to be passed on to the Remote Visualisation System (see
Section 6) so that the image can be viewed online while it
is still resident on the pipeline server. Figure 2 shows
the overall system architecture, in particular how the
ATOA, pipeline and RVS interact.



^ http://www.openpbs.org
6 The Remote Visualisation
System

The Remote Visualisation System (RVS) was designed to
enable visualisation of and interaction with large
astronomical images in the context of the VO. As opposed to
other VO image displays, such as CDS Aladin (Fernique et
al. 2004), RVS does not require the user to download the
data to the client machine. Furthermore it provides
rendering of image cubes, such as spectral-line cubes
created from ATCA data. The RVS server accepts FITS images,
which can be compressed, through local file:// URLs and
remote http or ftp access. The data should be co-located
with the server, or at least be available to it over a
high-bandwidth connection, while no such requirement is
placed on the client. Only minimal data transfer to the
client is necessary and this is independent of the size of
the source data set. The server-side architecture is
distributed to enable workload sharing and extensibility.
RVS makes use of several software components: CORBA to make
it distributed, AIPS++ as the image rendering component and
Java for the web services and client. The architecture of
the RVS system is shown in Figure 3.




Figure 3: RVS system architecture.



RVS is exposed through a web service interface using the
standard Web Service Description Language (WSDL ver. 1.1)^.
This can easily be integrated into custom applications.
Several client applications make use of the web service
interface: the RVSViewer (a traditional image viewer), a
thumbnail service (providing preview images) and a session
viewer. The session viewer connects to an existing
RVSViewer via a key. Multiple instances can be run at the
same time, making it possible to use it as a conferencing
tool where people can observe and interact with the data.
The ATOA pipeline re-uses the existing RVSViewer client by
passing it the file location of the output image.

RVS is not specific to the ATOA or prototype pipeline and
there are plans to use it for all ATCA archives. It has
been successfully tested on images and data cubes from
various surveys and has good performance on large datasets.
For example a 1.5 GB data cube from the Galactic All-Sky
Survey (GASS; McClure-Griffiths et al.) takes about one
minute to load. Compare this with downloading the full cube
from say the U.S. to Australia, which could take ~ 1-2
hours. For more information and direct access to RVS, see
http://www.atnf.csiro.au/vo/

^ http://www.w3.org/TR/wsdl
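
The comparison above implies a sustained transfer rate that is easy to work out. The short calculation below assumes 1 GB = 10^9 bytes and ignores protocol overhead; the function name is only for illustration.

```python
def implied_rate_mbit_s(size_gb: float, hours: float) -> float:
    """Average rate in Mbit/s needed to move size_gb in the given time."""
    megabits = size_gb * 8000.0  # 1 GB = 8000 Mbit (decimal units)
    return megabits / (hours * 3600.0)


# Moving a 1.5 GB cube in one to two hours corresponds to roughly
# 3.3 Mbit/s down to 1.7 Mbit/s of sustained throughput.
fast = implied_rate_mbit_s(1.5, 1.0)
slow = implied_rate_mbit_s(1.5, 2.0)
```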

7 Discussion 

The ATOA has been public since December 2004. We 
hope that it will encourage the reuse of ATCA data 
for projects other than those it was originally intended 
for. The framework used for the ATOA could easily 
be extended to include data from other telescopes and 
can be updated as additional metadata is required. 

The most significant improvement of the ATOA over existing
online archives (such as NRAO^ and MAST^) is the data
delivery mechanism. Most existing archives do not support
on-demand delivery of data over the web, instead requiring
the user to submit a form requesting files that then have
to be transferred to a publicly accessible ftp site or to
other media (such as CD) for physical delivery. In the
ATOA, the batch downloading of multiple files is handled by
a streaming TAR or ZIP archiving algorithm that performs
dynamic archiving as files are streamed over the web,
requiring no additional disk space on the server for these
operations.
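
The idea of archiving while streaming can be illustrated with the stream mode of Python's standard `tarfile` module. This is only a sketch of the general technique and not the ATOA implementation, which is a Java web application; the function name is hypothetical.

```python
import os
import tarfile


def stream_tar(paths, out_stream):
    """Write the given files to out_stream as an uncompressed TAR stream.

    Mode "w|" writes sequentially without seeking, so out_stream can be
    a network response object and no temporary archive is created on disk.
    """
    with tarfile.open(fileobj=out_stream, mode="w|") as archive:
        for path in paths:
            # Store each file under its base name inside the archive.
            archive.add(path, arcname=os.path.basename(path))
```

Because each file is read and written in turn, the additional server-side cost is limited to a small buffer rather than a full copy of the batch.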

In developing the ATCA data model and consider- 
ing the type of metadata required for automatic pro- 
cessing we identified several new metadata types that 
would be useful to store in the RPFITS files. As a 
result the following fields have been added to the
RPFITS files and will be available in all future ATCA
data:

• Four calibrator codes:
  C (standard phase calibrator)
  F (primary flux calibrator)
  B (bandpass calibrator)
  P (pointing calibrator)

• Pointing offsets

• Weather data: added rain gauge and phase rms 
and difference 

• Attenuator settings at start of scan 

• Subreflector position 

• Correlator configuration 

• Scan type 

• Coordinate type 

• Line mode 

• CACAL counter 

These will help both automatic processing systems and
astronomers assess the data quality in the observations
they are interested in. A full e-logbook system will be
used in the future, as currently the logs are all stored on
paper at the telescope and hence are not easily accessible
to ATOA users.

^ http://archive.nrao.edu/archive/e2earchive.jsp
^ http://archive.stsci.edu/

We have developed a prototype pipeline for pro- 
cessing of raw data for single-pointing continuum im- 
ages. This is attached to the ATOA to provide an 
improved service for users of the ATOA. At this stage 
the image quality is suitable for previewing the data 
in archive to see if it is of interest. Further manual 
processing would then be required to obtain images of 
scientific quality. 

A significant challenge in developing the ATOA
and the prototype pipeline was integrating pre-existing
software with modern software tools. For example, 
the Glish scripting language has no web service li- 
braries and so an extra layer had to be developed be- 
tween the data processing level and the web services. 
If re-implementing from scratch, a language such as 
Python would be a better alternative for developing 
the pipeline. 

In developing these tools we have started to ex- 
plore the techniques necessary for astronomical soft- 
ware development in the VO era. This is essential for 
future telescopes and surveys that Australia will pro- 
duce. Making access to existing Australian data as 
easy as possible will maximise its use in the interna- 
tional community. 
Figure 1: A portion of the data model for the ATCA. For an explanation of the
notation see Section 4.2. The complete data model is available from the ATOA website:
http://www.atnf.csiro.au/computing/web/atoa/implementation.html



